Subjective image-quality measurement plays a critical role in the development of image-processing applications. The purpose of a visual-quality metric is to approximate the results of subjective assessment. In this regard, more and more metrics are under development, but little research has considered their limitations. This paper addresses that deficiency: we show how image preprocessing before compression can artificially increase the quality scores provided by the popular metrics DISTS, LPIPS, HaarPSI, and VIF as well as how these scores are inconsistent with subjective-quality scores. We propose a series of neural-network preprocessing models that increase DISTS by up to 34.5%, LPIPS by up to 36.8%, VIF by up to 98.0%, and HaarPSI by up to 22.6% in the case of JPEG-compressed images. A subjective comparison of preprocessed images showed that for most of the metrics we examined, visual quality drops or stays unchanged, limiting the applicability of these metrics.
In recent years, display intensity and contrast have increased considerably. Many displays support high dynamic range (HDR) and 10-bit color depth. Since high bit-depth is an emerging technology, video content is still largely shot and transmitted with a bit depth of 8 bits or less per color component. Insufficient bit-depths produce distortions called false contours or banding, and they are visible on high contrast screens. To deal with such distortions, researchers have proposed algorithms for bit-depth enhancement (dequantization). Such techniques convert videos with low bit-depth (LBD) to videos with high bit-depth (HBD). The quality of converted LBD video, however, is usually lower than that of the original HBD video, and many consumers prefer to keep the original HBD versions. In this paper, we propose an algorithm to determine whether a video has undergone conversion before compression. This problem is complex; it involves detecting outcomes of different dequantization algorithms in the presence of compression that strongly affects the least-significant bits (LSBs) in the video frames. Our algorithm can detect bit-depth enhancement and demonstrates good generalization capability, as it is able to determine whether a video has undergone processing by dequantization algorithms absent from the training dataset.
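The core giveaway that an 8-bit frame was upconverted from a lower bit depth is that zero-padded low-bit-depth codes land only on a coarse quantization grid. A minimal sketch of that histogram-grid test (a naive illustration only; the paper's detector is a trained model that must also cope with compression noise in the LSBs, and `estimate_effective_bit_depth` is a hypothetical helper, not from the paper):

```python
import numpy as np

def estimate_effective_bit_depth(channel, max_bits=8):
    """Estimate the effective bit depth of an 8-bit channel: values produced
    by zero-padding b-bit codes are all multiples of 2**(8-b)."""
    values = np.unique(channel)
    for bits in range(1, max_bits + 1):
        step = 2 ** (max_bits - bits)
        if np.all(values % step == 0):
            return bits
    return max_bits

# simulate an 8-bit frame whose content was really captured at 6 bits
hbd = np.arange(256, dtype=np.uint8).reshape(16, 16)  # genuine 8-bit values
lbd = (hbd >> 2) << 2                                 # quantized to 6 bits, zero-padded

print(estimate_effective_bit_depth(hbd))  # 8
print(estimate_effective_bit_depth(lbd))  # 6
```

Even mild compression perturbs the LSBs and destroys this exact grid, which is why detecting dequantization after compression requires the learned approach the abstract describes.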
The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis, and to thereby reduce cognitive burden and medical error, has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, the Diagnostic Reasoning Benchmark (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models. Experiments with state-of-the-art pre-trained generative language models, using both large general-domain models and models continually trained on a medical corpus, demonstrate opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community.
We analyze a stochastic approximation algorithm for decision-dependent problems, in which the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems arise in performative prediction and its multiplayer extensions. We show that, under mild assumptions, the deviation between the algorithm's average iterate and the solution is asymptotically normal, with a covariance that cleanly decouples the effects of gradient noise and distributional shift. Moreover, building on the work of Hájek and Le Cam, we show that the asymptotic performance of the algorithm is locally minimax optimal.
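A toy instance of the setting above (an illustrative sketch under assumed dynamics, not the paper's analysis): the data a player sees, `z`, is drawn from a distribution whose mean shifts with the current decision `x`, and we run stochastic gradient steps on a squared loss while tracking the Polyak-Ruppert average of the iterates.

```python
import numpy as np

# Decision-dependent data: z ~ N(mu + eps * x, sigma^2), so the distribution
# shifts with the decision x. SGD on the loss (x - z)^2 / 2 has stable point
# solving x = mu + eps * x, i.e. x* = mu / (1 - eps) = 4.0 here.
rng = np.random.default_rng(1)
mu, eps, sigma = 2.0, 0.5, 0.1

x, iterates = 0.0, []
for t in range(1, 20001):
    z = mu + eps * x + sigma * rng.standard_normal()  # distribution reacts to x
    x -= (1.0 / t**0.75) * (x - z)                    # SGD step, decaying rate
    iterates.append(x)

x_bar = np.mean(iterates)  # Polyak-Ruppert average of the iterates
print(x_bar)               # close to mu / (1 - eps) = 4.0
```

The averaged iterate concentrates around the stable point; the abstract's result characterizes exactly how its fluctuations split into a gradient-noise term and a distribution-shift term.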
State-of-the-art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner to avoid producing duplicates. In this work, we propose a different approach that is equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object-detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching, which forces unique predictions, and a purely Transformer-based encoder-only architecture for sequence labeling. Compared with state-of-the-art models on standard benchmarks, the proposed approach is faster and shows superior or comparable performance in terms of both quality metrics and inference time. Our model sets a new state of the art on CaRB evaluated as OIE2016, while being faster at inference than the previous state of the art. We also evaluate a multilingual version of our model in a zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show a 15% performance improvement on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish. Code and models are available at https://github.com/sberbank-ai/detie.
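The order-agnostic loss mentioned above can be sketched in a few lines (a DETR-style illustration of the matching idea, not DetIE's exact loss, which operates on labeled token sequences): each prediction is matched one-to-one against a gold tuple by the Hungarian algorithm, so the loss is invariant to prediction order and duplicates are penalized.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def order_agnostic_loss(pred, gold):
    """Set loss via bipartite (Hungarian) matching. `pred` and `gold` are
    (n, d) embedding matrices; each prediction is matched to exactly one
    gold tuple, which discourages duplicate extractions."""
    cost = np.linalg.norm(pred[:, None, :] - gold[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].sum(), list(zip(rows.tolist(), cols.tolist()))

gold = np.array([[1.0, 0.0], [0.0, 1.0]])
pred = np.array([[0.1, 0.9], [0.9, 0.1]])  # predictions in swapped order
loss, matching = order_agnostic_loss(pred, gold)
print(matching)  # [(0, 1), (1, 0)] -- the matching is order-independent
```

Because the matching is recomputed per example, the network is free to emit the tuple set in any order in its single pass.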
The development of large and super-large language models, such as GPT-3, T5, Switch Transformer, and ERNIE, has significantly improved the performance of text generation. One important research direction in this field is the generation of texts with arguments. Solutions to this problem can be used in business meetings, political debates, dialogue systems, and the preparation of student essays; one of the main application areas is the economic domain. The key problem for argumentative text generation in Russian is the lack of annotated argumentation corpora. In this paper, we use translated versions of the Argumentative Microtexts, Persuasive Essays, and UKP Sentential corpora to fine-tune a RuBERT model. This model is then used to annotate a corpus of economic news with argumentation. The annotated corpus is in turn used to fine-tune a ruGPT-3 model, which generates argumentative texts. The results show that this approach improves the accuracy of argument generation by 20 percentage points (63.2% vs. 42.5%) compared with the original ruGPT-3 model.
Most computer-generated animation is created by manipulating meshes with rigs. While this approach works well for animating articulated objects such as animals, it has limited flexibility for animating less structured, free-form objects. We introduce Wassersplines, a novel trajectory-inference method for animating unstructured densities, based on recent advances in continuous normalizing flows and optimal transport. The key idea is to train a neurally parameterized velocity field that represents the motion between keyframes. Trajectories are then computed by advecting keyframes through the velocity field. We solve an additional Wasserstein-barycenter interpolation problem to guarantee strict adherence to keyframes. Our tool can stylize trajectories through a variety of PDE-based regularizers to create different visual effects. We demonstrate the tool on various keyframe interpolation problems to produce temporally coherent animations without meshing or rigging.
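The advection step at the heart of this pipeline is simple to sketch (a minimal illustration with a hand-written velocity field; in the method itself the field is a trained neural network and the integrator is part of a continuous normalizing flow):

```python
import numpy as np

def advect(points, velocity, t0=0.0, t1=1.0, steps=2000):
    """Transport sample points through a time-varying velocity field
    v(x, t) with forward Euler integration."""
    dt = (t1 - t0) / steps
    x = points.astype(float).copy()
    for k in range(steps):
        x += dt * velocity(x, t0 + k * dt)
    return x

def rotation_field(x, t):
    """Hypothetical field: rigid rotation about the origin at rate pi/2."""
    return (np.pi / 2) * np.stack([-x[:, 1], x[:, 0]], axis=1)

keyframe = np.array([[1.0, 0.0]])        # one sample of the keyframe density
end = advect(keyframe, rotation_field)
print(np.round(end, 2))                  # a quarter turn, approximately [[0. 1.]]
```

Replacing `rotation_field` with a learned network, and adding the Wasserstein-barycenter correction, turns this forward-Euler toy into the keyframe-interpolating trajectories the abstract describes.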
Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the information contained in the motion of the speaker's mouth is used to augment the audio features. The video modality is traditionally processed with a 3D convolutional neural network (e.g., a 3D version of VGG). Recently, image transformer networks (arXiv:2010.11929) demonstrated the ability to extract rich visual features for image-classification tasks. Here, we propose replacing the 3D convolutions with a video transformer to extract visual features. We train our baselines and the proposed model on a large-scale corpus of YouTube videos. The performance of our approach is evaluated on a labeled subset of YouTube videos as well as on the public LRS3-TED corpus. Our best video-only model obtains 34.9% WER on YTDEV18 and 19.3% on LRS3-TED, a 10% and 9% relative improvement over our convolutional baseline. After fine-tuning the model (1.6% WER), we achieve the state of the art for audio-visual recognition on LRS3-TED. In addition, in a series of experiments on multi-person AV-ASR, we obtain a 2% average relative reduction over the convolutional video front end.
Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to the actions of competing decision makers. This paper formulates a new game-theoretic framework for this phenomenon, called multiplayer performative prediction. We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative but can be found efficiently only when the game is monotone. We show that, under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and repeated (stochastic) gradient play. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative-free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and taking gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
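The repeated-retraining dynamic is easy to see in a toy two-player example (an illustrative sketch with assumed linear feedback, not the paper's general model): each player's data mean shifts with the opponent's decision, each round every player "retrains" by best-responding to the distribution it currently faces, and the iteration contracts to a performatively stable equilibrium.

```python
import numpy as np

# Player i's data mean is mu[i] + eps * x_opponent: the population reacts
# to the competing decision maker. Repeated retraining sets each decision
# to the mean of the distribution it currently induces.
mu = np.array([1.0, -1.0])
eps = 0.3  # feedback strength < 1 makes the update a contraction

x = np.zeros(2)
for _ in range(100):
    induced_means = mu + eps * x[::-1]  # opponent's decision shifts my data
    x = induced_means                   # retrain: best respond to current data

# The stable point solves x = mu + eps * P x, with P the swap matrix.
P = np.array([[0.0, 1.0], [1.0, 0.0]])
expected = np.linalg.solve(np.eye(2) - eps * P, mu)
print(np.allclose(x, expected))  # True
```

With feedback strength below one the map is a contraction, mirroring the abstract's claim that repeated retraining finds performatively stable equilibria under mild assumptions; Nash equilibria require the additional monotonicity machinery.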
Recent techniques have been successful in reconstructing surfaces as level sets of learned functions (such as signed distance fields) parameterized by deep neural networks. However, many of these techniques are limited to closed surfaces and cannot reconstruct shapes with boundary curves. We propose a hybrid shape representation that combines explicit boundary curves with implicit learned interiors. Using machinery from geometric measure theory, we parameterize currents with deep networks and use stochastic gradient descent to solve a minimal-surface problem. By modifying the metric according to target geometry, obtained, e.g., from a mesh or point cloud, we can use this approach to represent arbitrary surfaces, learning implicitly defined shapes with explicitly defined boundary curves. We further demonstrate learning families of shapes jointly parameterized by boundary curves and latent codes.